perm filename STRING.MSS[WHT,LSP] blob
sn#754073 filedate 1984-05-12 generic text, type T, neo UTF8
@Part[String, Root = "CLM.MSS"]
@Comment{Chapter of Common Lisp Manual. Copyright 1984 Guy L. Steele Jr.⎇
@MyChapter[Strings]
@index[string]
A string is a specialized vector (one-dimensional array)
whose elements are characters. Specifically, the type @f[string]
is identical to the type @f[(vector string-char)], which in turn
is the same as @f[(array string-char (*))].
Any string-specific function defined in this chapter
whose name begins with the prefix @f[string]
will accept a symbol instead of a string
as an argument @i[provided] that the operation never modifies that argument;
the print name of the symbol is used.
@Index2[P {symbol⎇, S {coercion to a string⎇]
@Index[print name]
In this respect the string-specific sequence operations are not
simply specializations of generic versions; the generic sequence
operations described in chapter @ref[KSEQUE] never accept symbols as sequences.
This slight inelegance is permitted in @clisp in the name of pragmatic utility.
One may get the effect of having a generic sequence function
operate on either symbols or strings by applying the coercion
function @Funref[string] to any argument whose data type is in doubt.
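As a small illustration of this coercion, consider the following sketch.
The helper @f[name-length] is hypothetical (it is not defined in this manual),
and the last form assumes the standard reader, which converts symbol names
to uppercase.
@lisp
;; NAME-LENGTH is a hypothetical helper, not part of Common Lisp.
;; LENGTH is a generic sequence function and does not accept symbols,
;; so the argument is first coerced explicitly with STRING.
(defun name-length (x)
  (length (string x)))

(name-length "abc")     ; @EV 3
(name-length 'abc)      ; @EV 3, the length of the print name
(string= 'abc "ABC")    ; true: a string function accepts the symbol directly
@Endlisp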
Also, there is a slight non-parallelism in the names of string functions.
Where the suffixes @f[equalp] and @f[eql] would be more appropriate,
for historical compatibility the suffixes @f[equal] and @f[=] are used instead
to indicate case-insensitive and case-sensitive character comparison,
respectively.
Any @xlisp object may be tested for being a string by
the predicate @Funref[stringp].
Note that strings, like all vectors, may have fill pointers
(though such strings are not necessarily @i[simple]).
String operations generally operate only on the active portion of the string
(below the fill pointer). See @Funref[fill-pointer] and related
functions.
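The following sketch shows the effect of a fill pointer on string operations.
It assumes an implementation that accepts @f[character] as the element type
of a string; under the definitions in this chapter one would write
@f[string-char] instead.
@lisp
(let ((s (make-array 10 :element-type 'character   ; assumed element type
                        :initial-element #\x
                        :fill-pointer 3)))
  (list (length s)              ; only the active portion is counted
        (string= s "xxx")       ; the seven inactive characters are ignored
        (string-upcase s)))     ; the result has length 3
;; The list holds 3, a true value, and "XXX".
@Endlisp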
@Section[String Access]
The following functions access a single character element of a string.
@Defun[Fun {char⎇, Args {@i[string] @i[index]⎇]
@Defun1[Fun {schar⎇, Args {@i[simple-string] @i[index]⎇]
The given @i[index] must be a non-negative integer less than
the length of @i[string], which must be a
string. The character at position @i[index]
of the string is returned as a character object.
(This character will necessarily satisfy the predicate @Funref[string-char-p].)
As with all sequences in @clisp, indexing is zero-origin.
For example:
@lisp
(char "Floob-Boober-Bab-Boober-Bubs" 0) @EV #\F
(char "Floob-Boober-Bab-Boober-Bubs" 1) @EV #\l
@Endlisp
See @Funref[aref] and @Funref[elt]. In effect,
@lisp
(char s j) @EQ (aref (the string s) j)
@endlisp
@Macref[setf] may be used with @f[char] to destructively replace
a character within a string.
For @f[char], the string may be any string;
for @f[schar], it must be a simple string.
In some implementations of @clisp, the function @f[schar] may
be faster than @f[char] when it is applicable.
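For instance, the following sketch replaces a character using @f[setf]
with @f[char]. The string is first copied with @f[copy-seq], on the
assumption that a literal string in program text should not be modified.
@lisp
(let ((s (copy-seq "hello")))   ; fresh copy; the literal is left alone
  (setf (char s 0) #\J)         ; destructively replace element 0
  (list s (schar s 4)))         ; the copy is a simple string, so SCHAR applies
;; @EV ("Jello" #\o)
@Endlisp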
@Enddefun
@Section[String Comparison]
The naming conventions for these functions and for their keyword
arguments generally follow the conventions for the generic sequence
functions. See chapter @Ref[KSEQUE].
@Defun[Fun {string=⎇, Funlabel {string#&M⎇, Args {@i[string1] @i[string2]⎇, Keys = {[start1][end1][start2][end2]⎇]
@f[string=] compares two strings and is true if
they are the same (corresponding characters are identical)
but is false if they are not.
The function @Funref[equal] calls @f[string=] if
applied to two strings.
The keyword arguments @Kwd[start1] and @Kwd[start2] are the places
in the strings to start the comparison.
The arguments @Kwd[end1] and @Kwd[end2] are the
places in the strings to stop comparing; comparison stops just
@i[before] the position specified by a limit.
The start arguments default to zero (beginning of string),
and the end arguments (if either omitted or @false)
default to the lengths of the strings (end of string),
so that by default the entirety of each string is examined.
These arguments are provided so that substrings can be compared
efficiently.
@f[string=] is necessarily false if the (sub)strings
being compared are of unequal length; that is, if
@Lisp
(not (= (- end1 start1) (- end2 start2)))
@Endlisp
is true, then @f[string=] is false.
@lisp
(string= "foo" "foo") @r[is true]
(string= "foo" "Foo") @r[is false]
(string= "foo" "bar") @r[is false]
(string= "together" "frog" :start1 1 :end1 3 :start2 2)
@r[is true]
@Endlisp
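The behavior just described can be summarized by the following sketch.
The name @f[sketch-string=] is hypothetical; this is an illustration of
the stated defaults and the length rule, not the definition used by any
implementation.
@lisp
;; SKETCH-STRING= is a hypothetical name used only for illustration.
(defun sketch-string= (string1 string2
                       &key (start1 0) end1 (start2 0) end2)
  (let* ((s1 (string string1))           ; symbols are coerced by STRING
         (s2 (string string2))
         (e1 (or end1 (length s1)))      ; omitted or NIL means end of string
         (e2 (or end2 (length s2))))
    ;; Substrings of unequal length can never be the same.
    (and (= (- e1 start1) (- e2 start2))
         ;; Otherwise compare corresponding characters with CHAR=.
         (do ((i start1 (+ i 1))
              (j start2 (+ j 1)))
             ((= i e1) t)
           (unless (char= (char s1 i) (char s2 j))
             (return nil))))))

(sketch-string= "together" "frog" :start1 1 :end1 3 :start2 2)  ; true
(sketch-string= "foo" "Foo")                                    ; false
@Endlisp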
@Incompatibility{@f[string=] is called @f[strequal] in @interlisp.⎇
@Enddefun
@Defun[Fun {string-equal⎇, Args {@i[string1] @i[string2]⎇, Keys = {[start1][end1][start2][end2]⎇]
@f[string-equal] is just like @f[string=] except that differences
in case are ignored; two characters are considered to be the same
if @Funref[char-equal] is true of them.
For example:
@lisp
(string-equal "foo" "Foo") @r[is true]
@Endlisp
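The keyword arguments behave as for @f[string=]. A small sketch
contrasting the two functions:
@lisp
(string-equal "Common Lisp" "COMMON" :end1 6)   ; true
(string=      "Common Lisp" "COMMON" :end1 6)   ; false: case differs
(string-equal "Common Lisp" "COMMON")           ; false: lengths differ
@Endlisp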
@Enddefun
@Defun[Fun {string<⎇, Funlabel {string#&L⎇, Args {@i[string1] @i[string2]⎇, Keys = {[start1][end1][start2][end2]⎇]
@Defun1[Fun {string>⎇, Funlabel {string#&N⎇, Args {@i[string1] @i[string2]⎇, Keys = {[start1][end1][start2][end2]⎇]
@Defun1[Fun {string<=⎇, Funlabel {string#&L#&M⎇, Args {@i[string1] @i[string2]⎇, Keys = {[start1][end1][start2][end2]⎇]
@Defun1[Fun {string>=⎇, Funlabel {string#&N#&M⎇, Args {@i[string1] @i[string2]⎇, Keys = {[start1][end1][start2][end2]⎇]
@Defun1[Fun {string/=⎇, Funlabel {string#&L#&N⎇, Args {@i[string1] @i[string2]⎇, Keys = {[start1][end1][start2][end2]⎇]
These functions compare the two string arguments lexicographically,
and the result is @false unless @i[string1] is respectively
less than, greater than,
less than or equal to, greater than or equal to, or not equal to @i[string2].
If the condition is satisfied, however, then
the result is the index within the strings of the first character
position at which the strings fail to match; put another way,
when the comparison starts at the beginning of both strings,
the result is the length of the longest common prefix of the strings.
A string @i[a] is less than a string @i[b] if
in the first position in which they differ the character of @i[a]
is less than the corresponding character of @i[b] according to
the function @Xfunref[X {char<⎇, L {char#&L⎇], or
if string @i[a] is a proper prefix of string @i[b]
(of shorter length and matching in all the characters of @i[a]).
The keyword arguments @kwd[start1] and @kwd[start2] are the places
in the strings to start the comparison.
The keyword arguments @kwd[end1] and @kwd[end2]
are the places in the strings to stop comparing; comparison stops just
@i[before] the position specified by a limit.
The ``start'' arguments default to zero (beginning of string),
and the ``end'' arguments (if either omitted or @false)
default to the lengths of the strings (end of string),
so that by default the entirety of each string is examined.
These arguments are provided so that substrings can be compared
efficiently. The index returned in case of a mismatch
is an index into @i[string1].
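A few sample calls may make the returned index concrete.
Note that the index @f[0] counts as true, since only @false is false.
@lisp
(string<  "abcd" "abxy")             ; @EV 2, first position that differs
(string<  "abc"  "abcd")             ; @EV 3, "abc" is a proper prefix
(string<  "abd"  "abc")              ; @EV NIL, condition not satisfied
(string>  "abd"  "abc")              ; @EV 2
(string<= "abc"  "abc")              ; @EV 3, no mismatch before the end
(string/= "abc"  "abc")              ; @EV NIL
(string<  "xxabcd" "abxy" :start1 2) ; @EV 4, an index into string1
@Endlisp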
@Enddefun
@Defun[Fun {string-lessp⎇, Args {@i[string1] @i[string2]⎇, Keys {[start1][end1][start2][end2]⎇]
@Defun1[Fun {string-greaterp⎇, Args {@i[string1] @i[string2]⎇, Keys {[start1][end1][start2][end2]⎇]
@Defun1[Fun {string-not-greaterp⎇, Args {@i[string1] @i[string2]⎇, Keys {[start1][end1][start2][end2]⎇]
@Defun1[Fun {string-not-lessp⎇, Args {@i[string1] @i[string2]⎇, Keys {[start1][end1][start2][end2]⎇]
@Defun1[Fun {string-not-equal⎇, Args {@i[string1] @i[string2]⎇, Keys {[start1][end1][start2][end2]⎇]
These are exactly like @f[string<], @f[string>], @f[string<=],
@f[string>=], and @f[string/=], respectively, except that distinctions between
uppercase and lowercase letters are ignored. It is as if
@Funref[char-lessp] were used instead of @Xfunref[X {char<⎇, L {char#&L⎇]
for comparing characters.
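For example, in the following sketch the results do not depend on
the case of the letters involved:
@lisp
(string-lessp        "apple" "BANANA")   ; @EV 0
(string-greaterp     "abc"   "ABD")      ; @EV NIL
(string-not-greaterp "Abc"   "aBC")      ; @EV 3
(string-not-equal    "Hello" "hellp")    ; @EV 4
@Endlisp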
@Enddefun
@Section[String Construction and Manipulation]
Most of the interesting operations on strings may be performed
with the generic sequence functions described in chapter @ref[KSEQUE].
The following functions perform additional operations that are specific
to strings.
@Defun[Fun {make-string⎇, Args {@i[size]⎇, Keys {[initial-element]⎇]
This returns a string (in fact a simple string)
of length @i[size], each of whose characters
has been initialized to the @Kwd[initial-element] argument.
If an @Kwd[initial-element] argument is not specified, then the string will
be initialized in an implementation-dependent way.
@Implementation{It may be convenient to initialize the string
to null characters, or to spaces, or to garbage (``whatever was there'').⎇
A string is really just a one-dimensional array of ``string characters''
(that is, those characters that are members of type @f[string-char]).
More complex character arrays may be constructed using the
function @Funref[make-array].
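A brief sketch of typical use follows. When @Kwd[initial-element]
is omitted, only the length and type of the result can be relied upon.
@lisp
(make-string 5 :initial-element #\*)    ; @EV "*****"
(length (make-string 3))                ; @EV 3, contents unspecified
(simple-string-p (make-string 3))       ; true
@Endlisp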
@Enddefun
@Defun[Fun {string-trim⎇, Args {@i[character-bag] @i[string]⎇]
@Defun1[Fun {string-left-trim⎇, Args {@i[character-bag] @i[string]⎇]
@Defun1[Fun {string-right-trim⎇, Args {@i[character-bag] @i[string]⎇]
@f[string-trim] returns a substring of @i[string], with all characters in
@i[character-bag] stripped off the beginning and end.
The function @f[string-left-trim] is similar but strips characters
off only the beginning; @f[string-right-trim] strips off only the end.
The argument @i[character-bag] may be any sequence containing
characters.
For example:
@lisp
(string-trim '(#\Space #\Tab #\Newline) " garbanzo beans
") @EV "garbanzo beans"
(string-trim " (*)" " ( *three (silly) words* ) ")
@EV "three (silly) words"
(string-left-trim " (*)" " ( *three (silly) words* ) ")
@EV "three (silly) words* ) "
(string-right-trim " (*)" " ( *three (silly) words* ) ")
@EV " ( *three (silly) words"
@Endlisp
If no characters need to be trimmed from the @i[string],
then either the argument @i[string] itself or a copy of it may
be returned, at the discretion of the implementation.
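The trimming rule can be sketched in terms of the generic sequence
functions. The name @f[sketch-left-trim] is hypothetical, and this version
always returns a fresh string, which is one of the two behaviors permitted
above; an implementation may do otherwise.
@lisp
;; SKETCH-LEFT-TRIM is a hypothetical name used only for illustration.
(defun sketch-left-trim (character-bag string)
  (let* ((s (string string))
         ;; Index of the first character that is not in the bag, or NIL.
         (start (position-if-not
                  #'(lambda (c) (find c character-bag))
                  s)))
    (if start
        (subseq s start)    ; SUBSEQ always makes a copy
        "")))               ; every character was in the bag

(sketch-left-trim " (*)" " ( *three (silly) words* ) ")
;; @EV "three (silly) words* ) "
@Endlisp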
@Enddefun
@Defun[Fun {string-upcase⎇, Args {@i[string]⎇, Keys = {[start][end]⎇]
@Defun1[Fun {string-downcase⎇, Args {@i[string]⎇, Keys = {[start][end]⎇]
@Defun1[Fun {string-capitalize⎇, Args {@i[string]⎇, Keys = {[start][end]⎇]
@f[string-upcase] returns a string just like @i[string] with all lowercase
characters replaced by the corresponding uppercase characters. More
precisely, each character of the result string is produced by applying
the function @Funref[char-upcase] to the corresponding character of
@i[string].
@f[string-downcase] is similar, except that uppercase characters are
converted to lowercase characters (using @Funref[char-downcase]).
The keyword arguments @Kwd[start] and @Kwd[end] delimit the portion
of the string to be affected. The result is always of the same length
as @i[string], however.
The argument is not destroyed. However, if no characters in the argument
require conversion, the result may be either the argument or a copy of it,
at the implementation's discretion.
For example:
@lisp
(string-upcase "Dr. Livingston, I presume?")
@EV "DR. LIVINGSTON, I PRESUME?"
(string-downcase "Dr. Livingston, I presume?")
@EV "dr. livingston, i presume?"
(string-upcase "Dr. Livingston, I presume?" @Kwd[start] 6 @Kwd[end] 10)
@Ev "Dr. LiVINGston, I presume?"
@Endlisp
@f[string-capitalize] produces a copy of @i[string] such that,
for every word in the copy, the first character of the word,
if case-modifiable, is uppercase and
any other case-modifiable characters in the word are lowercase.
For the purposes of @f[string-capitalize],
a word is defined to be a
consecutive subsequence consisting of case-modifiable characters or digits,
delimited at each end either by a non-case-modifiable non-digit
or by an end of the string.
For example:
@lisp
(string-capitalize " hello ") @EV " Hello "
@Tabclear
@Tabset[4]
(string-capitalize
@\"occlUDeD cASEmenTs FOreSTAll iNADVertent DEFenestraTION")
@>@EV @\"Occluded Casements Forestall Inadvertent Defenestration"
(string-capitalize 'kludgy-hash-search) @EV "Kludgy-Hash-Search"
(string-capitalize "DON'T!") @EV "Don'T!" ;@i[not] "Don't!"
(string-capitalize "pipe 13a, foo16c") @EV "Pipe 13a, Foo16c"
@Endlisp
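The definition of a word given above can be sketched as a single pass
over the string. The name @f[sketch-capitalize] is hypothetical, the
keyword arguments are omitted, and @f[alphanumericp] is assumed to be an
adequate test for ``case-modifiable character or digit,'' which holds for
the standard characters.
@lisp
;; SKETCH-CAPITALIZE is a hypothetical name used only for illustration.
(defun sketch-capitalize (string)
  (let ((result (copy-seq (string string)))   ; never modify the argument
        (in-word nil))                        ; true while inside a word
    (dotimes (i (length result) result)
      (let ((c (char result i)))
        (cond ((alphanumericp c)
               ;; First character of a word goes to uppercase,
               ;; every later character of the word to lowercase.
               (setf (char result i)
                     (if in-word (char-downcase c) (char-upcase c)))
               (setq in-word t))
              (t (setq in-word nil)))))))      ; a delimiter ends the word

(sketch-capitalize "DON'T!")              ; @EV "Don'T!"
(sketch-capitalize "pipe 13a, foo16c")    ; @EV "Pipe 13a, Foo16c"
@Endlisp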
@Incompatibility{Very approximate @interlisp equivalents to
@f[string-upcase], @f[string-downcase], and @f[string-capitalize]
are @f[u-case], @f[l-case] with second argument @nil,
and @f[l-case] with second argument @true.⎇
@Enddefun
@Defun[Fun {nstring-upcase⎇, Args {@i[string]⎇, Keys = {[start][end]⎇]
@Defun1[Fun {nstring-downcase⎇, Args {@i[string]⎇, Keys = {[start][end]⎇]
@Defun1[Fun {nstring-capitalize⎇, Args {@i[string]⎇, Keys = {[start][end]⎇]
These functions are just like @f[string-upcase],
@f[string-downcase], and @Funref[string-capitalize]
but destructively modify the argument @i[string] by altering
case-modifiable characters as necessary.
The keyword arguments @Kwd[start] and @Kwd[end] delimit the portion
of the string to be affected. The argument @i[string] is returned as
the result.
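For example, in the following sketch the string is copied first,
on the assumption that a literal string should not be modified:
@lisp
(let ((s (copy-seq "hello world")))
  (list (eq (nstring-upcase s :end 5) s)   ; the argument itself is returned
        s))
;; The first element is true; the second is "HELLO world".
@Endlisp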
@Enddefun
@Defun[Fun {string⎇, Args {@i[x]⎇]
@f[string] coerces its argument to a string.
Most of the string
functions effectively apply @f[string]
to those of their arguments that are supposed to be
strings.
If @i[x] is a string, it is returned.
@Index2[P {print name⎇, S {coercion to string⎇]
If @i[x] is a symbol, its print name is returned.
@Index2[P {symbol⎇, S {coercion to string⎇]
If @i[x] is a string character (a character of type @f[string-char]),
then a string containing that one character is returned.
@Index2[P {character⎇, S {coercion to string⎇]
In any other situation, an error is signalled.
To convert a sequence of characters to a string, use @Funref[coerce].
(Note that @f[(coerce x 'string)] will not succeed if @f[x] is a symbol.
Conversely, @f[string] will not convert a list or other sequence
to be a string.)
To get the string representation of a number or any other @xlisp
object, use @Funref[prin1-to-string], @Funref[princ-to-string],
or @Funref[format].
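The cases above can be compared side by side in the following sketch.
The result shown for the symbol assumes the standard reader, which converts
symbol names to uppercase, and the result for the number assumes the
default printer settings.
@lisp
(string "abc")                  ; @EV "abc", the argument itself
(string 'abc)                   ; @EV "ABC", the print name
(string #\a)                    ; @EV "a"
(coerce '(#\a #\b #\c) 'string) ; @EV "abc"; STRING would signal an error here
(princ-to-string 42)            ; @EV "42"
(length (prin1-to-string "hi")) ; @EV 4, since the double quotes are printed
@Endlisp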
@Enddefun